EFFECTS OF SPECTRUM SAMPLING ON
SPEECH INTELLIGIBILITY


ANTHONY  E.  CASTELNOVO 

U. S. ARMY BEHAVIORAL SCIENCE RESEARCH LABORATORY
WASHINGTON, D. C. 20515


INTRODUCTION 



The human performance experimentation research area at BESRL is
concerned with behavioral functions common to a variety of Army jobs.
Typical of this is the research in the Combat Communications unit,
which is concerned with finding ways to improve the performance of
the human operator in military communications systems. One of the
more serious problems the operator faces is that of noise which
obscures the message. The noise may be broad-band noise or may appear
in specific bands, depending on the source. If the noise appears in
relatively narrow bands, these might be eliminated; however, there are
no firm data from which we can estimate the effect of such a
procedure on the operator's performance. This study was designed to
gain some preliminary information about the effect on performance of
excising portions of the speech spectrum. It is recognized that
sophisticated techniques, such as digital transmission of voice, are
being developed and employed to overcome the effects of noise and to
transmit communications in a secure form. There are, however,
instances where these techniques are not feasible and it may be useful
to reduce the amount of spectrum dealt with by excising selected
bands ( ).


The frequency domain of speech and its relation to intelligibility
have been the subject of research by many investigators (1, 2, ...).
These investigators have measured the average speech
spectra and studied the effects on speech intelligibility of
eliminating continuous bands from the upper and lower areas of the
speech spectrum. Another way of treating the frequency domain is to
remove bands simultaneously from several locations in the speech
spectrum. Relatively little has been done to study the latter
possibility. Kryter (7) noted that if several narrow bands are
severely rejected, the intelligibility of the remaining spectrum is
greater than that predicted by Articulation Index computations. In a
follow-up, Kryter (8) observed that for constant speech
intelligibility the total "effective" bandwidth required for the best
multiple pass band system is less than that required for the
contiguous pass band systems by a factor of 2. This phenomenon may be
explained as a function of redundancy. Removing some narrow bands
reduces redundancy though not necessarily intelligibility. That
redundancy is a characteristic of the speech spectrum, and that some
reduction may be made without a corresponding reduction in
intelligibility, has been noted before.

M. R. Schroeder (9) briefly reviews the work of Homer Dudley and
notes his contribution to the origin of Vocoders, which take advantage
of the redundancy of the speech spectrum.

Apart from Kryter's work, which employed a limited amount of
filtering, there appears to be no other relevant work in the
literature. As for research on the effect of noise on a spectrum
composed of discrete bands, there appears to be none at all. The
present study is intended to gain preliminary information about the
effect on intelligibility of the number and size of segments which
are excised and the effect of noise on the intelligibility of a
speech spectrum composed of discrete bands. Such information would be
useful in assessing the feasibility of eliminating segments of the
spectrum which may carry particularly high levels of noise and of
employing the spectrum space in the interstices for other uses.

PROCEDURE 

To accomplish the sampling of the spectrum, a set of 24 electrical
band-pass filters was used which permitted passing very narrow
bands and which had very steep roll-off. A roll-off of 27 dB, from
-3 dB to -30 dB, occurred in the space of 20 cycles at the lower
frequencies and in about 35 cycles at the higher frequencies. The
bandwidths¹ of the individual filters at -16 dB varied from [...]0 to
115 cycles.

The 24 filters allowed a total bandwidth of 1300 cycles, from 375 Hz
to 1684 Hz, to be passed. Each of the 24 band-pass filters could be
switched in and out of the circuit independently of the other filters.
The filter set was inserted in the system following a mixer which
mixed the speech reproduced by an Ampex² tape machine with noise
from a Grason-Stadler² noise generator set for "speech" shaping.


¹The bandwidth was computed for the -16 dB point to approximate the
effective bandwidth of the filter. The -3 dB or -6 dB point often
used in specifying the filter bandwidth does not take into account
the intelligibility contributed by the filter skirt beyond that
point.

²Use or mention of trade names is solely in the interest of
precision of reporting procedures and does not constitute indorsement
by USA BESRL or the Department of Army.
The combined stimulus material was amplified by a McIntosh² amplifier
which was used to drive [...]-10² headsets. The system noise was
[...] dB relative to the rms value of the speech (integrated over
[...] seconds).
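As a rough consistency check on the filter-bank figures quoted above, the short sketch below recomputes the span of the bank and the steepness of the filter skirts. The variable names are illustrative only, and the mean filter width assumes that the 24 pass bands tile the span without gaps, which the text does not state.

```python
# Figures quoted in the text ("cycles" here means cycles per second, i.e. Hz).
LOW_EDGE_HZ = 375
HIGH_EDGE_HZ = 1684
NUM_FILTERS = 24

span_hz = HIGH_EDGE_HZ - LOW_EDGE_HZ        # nominal pass region of the bank
mean_width_hz = span_hz / NUM_FILTERS       # assumption: gap-free tiling

# Skirt steepness: 27 dB (from -3 dB to -30 dB) over the stated widths.
ROLLOFF_DB = 30 - 3
slope_low_db_per_hz = ROLLOFF_DB / 20       # lower-frequency filters
slope_high_db_per_hz = ROLLOFF_DB / 35      # higher-frequency filters

print(f"span: {span_hz} Hz, mean filter width: {mean_width_hz:.1f} Hz")
print(f"skirt slope: {slope_high_db_per_hz:.2f} to "
      f"{slope_low_db_per_hz:.2f} dB per Hz")
```

The span comes to 1309 Hz, close to the 1311-cycle bandwidth listed for configuration 1 in Table 1.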

The stimulus material was presented to 36 test subjects. These
were Army enlisted males under 30 years of age, hearing category I
(a hearing test at the time of the experiment showed that all had
excellent hearing), with no language problem and no previous
experience in the communication field. The subjects were located in an
Industrial Acoustics Company² series 1200 chamber which maintained
a very low level of ambient noise.

The subjects were given 30 hours of training over a period of a
week (6 hours a day for 5 days) in listening to PB words spoken by
the three speakers. The materials had been subjected to filtering
similar to that used in the experiment.

Following the training the subjects started the experimental
sessions. These consisted of three one-half hour sessions on each
of three days. Each half-hour experimental session was followed by
a one-hour rest period. Six experimental conditions were presented
each experimental period. Thus over the nine periods a total of 54
experimental conditions were presented. These were composed of 18
filter conditions and 3 noise conditions used with each filter
condition. Figure 1 presents the filter conditions. Noise Condition 1
was zero noise from the noise generator. Noise Condition 2 was [...] dB
below the maximum rms speech amplitude (integrated over [...] second).
Noise Condition 3 mixed in noise at 1[...] dB below the maximum rms
speech amplitude (integrated over [...] seconds).
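The counts in this design can be cross-checked with a few lines of arithmetic. The sketch below reads the design as 18 filter conditions crossed with 3 noise conditions, and it assumes a balanced assignment of conditions to subjects; the replication figure it arrives at can be compared with the 12 replications per day reported in the results. All names are illustrative.

```python
# Counts stated in the text.
SESSIONS_PER_DAY = 3
DAYS = 3
CONDITIONS_PER_SESSION = 6
FILTER_CONDITIONS = 18
NOISE_CONDITIONS = 3
SUBJECTS = 36

periods = SESSIONS_PER_DAY * DAYS                        # experimental periods
presented = periods * CONDITIONS_PER_SESSION             # conditions per subject
factorial_cells = FILTER_CONDITIONS * NOISE_CONDITIONS   # filter x noise cells

# Each subject hears 18 of the cells per day, so across all subjects every
# cell can be replicated this many times per day (assuming balance).
per_subject_per_day = SESSIONS_PER_DAY * CONDITIONS_PER_SESSION
replications_per_day = SUBJECTS * per_subject_per_day // factorial_cells

print(periods, presented, factorial_cells, replications_per_day)
```

Both routes give the same number of experimental conditions, which is what allows every subject to hear every filter-by-noise combination exactly once.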

The speech material consisted of eighteen Harvard PB word lists
(10) which had been recorded by three talkers, six lists by each
talker. Each subject was exposed to all experimental conditions.

The design was such that all subjects, talkers, and word lists
were associated with each experimental condition. The subjects had
been instructed to respond to each stimulus word regardless of how
unsure they were; and except for a very few instances a response was
made to each word.


RESULTS

The data were reduced to the mean values for each of the
experimental conditions. These data are shown in Table 1. An analysis
of variance was made for the main effects and for interaction of days,


See  footnote  on  page  2. 
Table 1


COMPARISON OF EXPERIMENTALLY OBTAINED INTELLIGIBILITIES
WITH THOSE COMPUTED BY USE OF THE ARTICULATION INDEX AT
THREE NOISE LEVELS AT GIVEN BANDWIDTHS


                    Noise Cond. 1     Noise Cond. 2     Noise Cond. 3
Filter    Band-
Config.   width    Obtain.   Comp.   Obtain.   Comp.   Obtain.   Comp.

   1      1311       59       63       51       58       37       30
   2       931       59       44       43       33       29       18
   3       917       58       42       43       33       29       18
   4      1028       57       50       49       38       35       21
   5      1022       55       50       44       38       30       21
   6       903       55       42       39       33       29       18
   7       906       55       42       43       33       30       18
   8       750       51       35       37       27       26       15
   9       817       50       40       40       30       29       16
  10       739       50       35       37       26       25       14
  11       688       46       32       40       25       25       14
  12       667       46       30       40       24       26       12
  13       577       43       25       26       18       15       11
  14       478       38       20       30       15       19        9
  15       600       37       27       30       20       20       11
  16       815       34       35       18       26       11       14
  17       491       29       18       20       15       14        9
  18       518       27       22       22       17       14       10


blocks of experimental conditions, period, speaker, filter condition
and noise conditions. A small but significant improvement was found
over the three days of testing even though the experimental sessions
had been preceded by a week of training. This, however, did not
affect the results, as each of the 54 experimental conditions was
replicated 12 times on each of the three days. As was anticipated,
the filter and noise factors were statistically significant beyond
the .01 level. Blocks of experimental conditions, periods, and
speakers produced non-significant F-ratios, and the interactions of
days by filter conditions and days by noise conditions showed a
probability of occurrence of between .10 and .05. The filter by noise
interaction was not significant although, as we note below, there is
a significant change in the relationship between bandwidth and
intelligibility as a function of noise level.

The intelligibilities produced by the different filter
configurations were also compared to bandwidth and to expected
intelligibility for a contiguous spectrum computed from the AI
(Articulation Index). (See Table 1.) Each of the filter
configurations used was fairly representative of the total 1300 cycle
band. The filter configurations were designed to have the same
average AI per cycle as the total spectrum. This was done to maintain
a linear relationship between AI and bandwidth (see Figure 2) and thus
avoid confounding the variation in intelligibility resulting from the
amount of spectrum with variations in intelligibility which might be
caused by using spectra concentrated in particular areas of the
spectrum. This was done even though, for the area of the spectrum
used in this study, it was not very critical. The loss in
articulation for the upper part of the spectrum, as compared to the
lower part, is not great. This is reflected by the values for
configurations 16 and 18 as shown in Figure 2.

Figures 3, 4, and 5 show the intelligibility data for the 18
filter configurations plotted for each noise level. The ordinate
shows percent intelligibility and the abscissa the total bandwidth of
the filter configuration. The data for each noise level were fitted
(11) by a parabola of the form Y = A + BX + CX². Sixteen of the 18
configurations were included in the array fitted. The data points
for configurations 16 and 18 were not included because they were not
distributed samplings of the available spectrum. Configuration 18
included the lower 518 cycles of the spectrum and configuration 16
the upper 815 cycles approximately.
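The report gives no detail on the fitting procedure beyond citing reference (11). As an illustration only, the sketch below assumes an ordinary least-squares fit of Y = A + BX + CX² and applies it to the noise condition 1 values as they appear in Table 1 here (configurations 16 and 18 excluded, as in the text). The function name and the choice to express bandwidth in kilocycles are assumptions made for this example, and the table values may themselves carry transcription errors.

```python
def fit_parabola(xs, ys):
    """Ordinary least-squares fit of y = a + b*x + c*x**2 (normal equations)."""
    sx = [sum(x ** k for x in xs) for k in range(5)]          # sums of x^k
    sxy = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x3 system: row i is [sx[i], sx[i+1], sx[i+2] | sxy[i]].
    m = [[sx[i + j] for j in range(3)] + [sxy[i]] for i in range(3)]
    for col in range(3):                  # Gauss-Jordan with partial pivoting
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


# (bandwidth in cycles, obtained % intelligibility) for noise condition 1,
# configurations 1-15 and 17 as listed in Table 1.
NOISE1 = [(1311, 59), (931, 59), (917, 58), (1028, 57), (1022, 55), (903, 55),
          (906, 55), (750, 51), (817, 50), (739, 50), (688, 46), (667, 46),
          (577, 43), (478, 38), (600, 37), (491, 29)]

# Work in kilocycles to keep the normal equations well conditioned.
xs = [bw / 1000 for bw, _ in NOISE1]
ys = [pct for _, pct in NOISE1]
a, b, c = fit_parabola(xs, ys)
print(f"Y = {a:.1f} + {b:.1f} X + {c:.1f} X^2   (X in kilocycles)")
```

A negative C term corresponds to the flattening of the curve at wide bandwidths that the text attributes to redundancy in the speech spectrum.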

The redundancy in the speech spectrum is seen as the curvature
exhibited by the data in Figures 3, 4, and 5. Figure 3, which
presents the data for the lowest noise level, reaches a maximum of
58% for the full spectrum of 1300 cycles but appears to have reached
an asymptote at about 1100 cycles; even at 1000 cycles the fitted
curve does not show much loss. Also, the empirically obtained values
for configurations 2, 3, and 4 for noise condition 1 are not
significantly lower in intelligibility than that for configuration 1
(t-tests gave probabilities between .70 and .80). Figures 4 and 5,
which present the data for increased levels of noise, show
progressively less curvature. The curvature component for noise
condition 1 is significant at the .001 level. For noise condition 2,
the degree of curvilinearity is less and reaches significance only at
the .05 level, and the curvature for noise condition 3 appears slight
and does not reach significance at the .05 level. On the basis of
these comparisons it appears that there is in fact an interaction
between noise level and filter configuration which was not apparent
from the general interaction test.

If we compare configurations 16 and 18, which are examples of
massed bandwidths, with configurations 9 and 14, which are comparable
in AI and bandwidth but which are distributed over the spectrum, we
see that the distributed configurations produce a significantly
higher level of intelligibility. (See Table 2.) This relationship
holds for all three noise levels. Distributed configuration 4 has a
bandwidth of 1028 cycles, which is substantially less than that of
massed configuration 1, but the two give almost the same level of
intelligibility at each of the three noise levels.


Table 2

RESULTS OF t-TEST COMPARISONS OF SELECTED MEANS


Filter                    Noise Level Condition
Configurations          1            2            3

 1 vs 4                .72          .59          .66
 9 vs 16              5.65*        8.40*        6.66*
14 vs 17              4.0[...]*    5.64*        2.47*
14 vs 18              4.28*        5.20*        2.24†

*Significant at the .01 level
†Significant at the .05 level
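The t statistics in Table 2 cannot be recomputed without the per-subject scores, but the size of each difference can be read off the obtained means in Table 1. The sketch below does only that; the dictionary and function names are illustrative, and the values are those of Table 1 as it appears here.

```python
# Obtained % intelligibility at noise conditions 1, 2, 3 (from Table 1).
OBTAINED = {
    1: (59, 51, 37), 4: (57, 49, 35), 9: (50, 40, 29),
    14: (38, 30, 19), 16: (34, 18, 11), 17: (29, 20, 14), 18: (27, 22, 14),
}
# Pairs compared in Table 2, as (first, second).
PAIRS = [(1, 4), (9, 16), (14, 17), (14, 18)]


def mean_differences(first, second):
    """Percentage-point difference, first minus second, per noise condition."""
    return tuple(p - q for p, q in zip(OBTAINED[first], OBTAINED[second]))


for first, second in PAIRS:
    print(f"{first:>2} vs {second:<2}: {mean_differences(first, second)}")
```

Configurations 1 and 4 differ by only two percentage points at every noise level, while the distributed configurations 9 and 14 exceed their massed counterparts by 5 to 22 points.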


Using the Articulation Index computed for each filter
configuration and referring to the typical relationship between
Articulation Index and intelligibility of PB words (Figure 7,
Reference 4), the expected intelligibility was computed for these
filter configurations and plotted in Figure 3 (shown as the dashed
line). These values approximate a straight line. Using these
computed intelligibilities as a reference, we see that the
experimentally obtained intelligibilities for the configurations with
undistributed bandwidths, points 16 and 18, approximate what would be
expected for this amount of bandwidth. The configurations with
distributed bandwidths produce comparatively higher intelligibilities.
There seems to be no apparent reason for the discrepancy between
obtained and computed intelligibility for configuration 1.
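The comparison between obtained and computed intelligibility can be made concrete from the noise condition 1 columns of Table 1. The sketch below is illustrative only: it treats configurations 2 through 15 as the "distributed" group, which is an assumption based on the text's description rather than a grouping the report states explicitly, and it uses the table values as they appear here.

```python
# Noise condition 1 values from Table 1: config -> (obtained %, computed %).
NOISE1 = {
    1: (59, 63), 2: (59, 44), 3: (58, 42), 4: (57, 50), 5: (55, 50),
    6: (55, 42), 7: (55, 42), 8: (51, 35), 9: (50, 40), 10: (50, 35),
    11: (46, 32), 12: (46, 30), 13: (43, 25), 14: (38, 20), 15: (37, 27),
    16: (34, 35), 17: (29, 18), 18: (27, 22),
}
MASSED = (16, 18)             # single contiguous pass band each
DISTRIBUTED = range(2, 16)    # assumed grouping: configurations 2 through 15


def excess(config):
    """Obtained minus computed intelligibility, in percentage points."""
    obtained, computed = NOISE1[config]
    return obtained - computed


def mean_excess(configs):
    values = [excess(cfg) for cfg in configs]
    return sum(values) / len(values)


print(f"massed:      {mean_excess(MASSED):+.1f} points")
print(f"distributed: {mean_excess(DISTRIBUTED):+.1f} points")
print(f"config 1:    {excess(1):+d} points")
```

On these figures the massed configurations sit within a few points of the computed values, the distributed ones average well above them, and configuration 1 is the only one that falls noticeably below its computed value.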

Configurations 16, 17, and 18, which are the poorest (least well
distributed) samplings of the spectrum, also yield the lowest
intelligibilities. As shown in Figure 1 they leave the largest areas
of the spectrum unsampled. Configurations 16 and 18, as has been
noted, are each composed of a single pass band. Configuration 17
consists of two pass bands, one at each end of the spectrum, which
leaves a gap of 820 cycles in the center of the spectrum.
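The size of the central gap in configuration 17 can be checked against the bandwidths in Table 1. The sketch below is a simple arithmetic check with illustrative names; it assumes the gap and the passed bandwidth are measured on the same -16 dB basis.

```python
FULL_BANDWIDTH = 1311     # configuration 1 in Table 1, cycles at -16 dB
CONFIG17_PASSED = 491     # configuration 17 in Table 1
CONFIG17_GAP = 820        # central gap quoted in the text

# The two end bands plus the central gap should account for the full spectrum.
accounted = CONFIG17_PASSED + CONFIG17_GAP
fraction_removed = CONFIG17_GAP / FULL_BANDWIDTH

print(accounted == FULL_BANDWIDTH)
print(f"{fraction_removed:.1%} of the spectrum excised as one block")
```

The passed bandwidth and the gap add up exactly to the full-spectrum bandwidth of configuration 1, so configuration 17 removes more than three-fifths of the spectrum in a single contiguous block.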

CONCLUSIONS 

Segments of the speech spectrum may be excised under conditions
of low noise without incurring a proportionate reduction in
intelligibility, as we might have expected for this spectrum had we
used high or low pass filtering. Differences in size and number of
segments excised, if the size of the excisions is not large, do not
appear to differentially affect intelligibility. We see in Figures
3, 4, and 5 that configurations with different sized segments excised
but with approximately equivalent bandwidths are of approximately
equal intelligibility. This is best illustrated by the cluster
formed by configurations 2, 6, and 7, which are composed of different
numbers of samples and have had different sized segments excised.
However, if the size of the excision becomes a relatively large
contiguous portion of the total spectrum, as is the case with
configuration 17, intelligibility suffers; excising several smaller
segments instead, as was done in configuration 14, results in a
significantly higher level of intelligibility for a given total
amount of bandwidth. Thus, it appears that bands of about 200 cycles
may be excised with no greater loss than would result from excising
an equivalent amount as a larger number of smaller segments. Under
some conditions relatively large bands of the spectrum may be excised
without a significant loss in intelligibility.

It seems reasonable to expect that some gain in intelligibility
may be made by excising narrow bands of the spectrum which have a
poor speech-to-noise ratio. The human auditory system can be
characterized as a filter with relatively slow roll-off; that is, a
sound from one part of the spectrum can interfere with the perception
of a sound from another part of the spectrum. This spread of masking,
as it is called, can cause relatively narrow bands of noise from one
part of the spectrum to interfere with the perception of speech which
lies in other parts of the spectrum. Future studies will be
concerned with the effect on the operator's performance caused by
removing portions of the spectrum which carry high levels of noise and
consequently give rise to masking of adjacent spectrum areas.

Figure 1. Representation of the filter configurations.
(Lines indicate pass bands)

Figure 2. Relationship between Articulation Index and total
bandwidth. (The data points are the filter
configurations)

Figure 3. Comparison of experimentally obtained
intelligibilities for noise condition 1 with computed
intelligibilities

Figure 4. Experimentally obtained intelligibilities for
noise condition 2

Figure 5. Experimentally obtained intelligibilities for
noise condition 3